Deposition and Extension Approach to Find Longest Common Subsequence for Multiple Sequences

Author

  • Kang Ning
Abstract

Finding the longest common subsequence (LCS) of a set of sequences is an interesting and challenging problem in computer science. The problem is NP-complete, but because of its importance many heuristic algorithms have been proposed, such as the Long Run algorithm and the Expansion algorithm. However, the performance of many current heuristic algorithms deteriorates quickly as the number of sequences and the sequence length increase. In this paper, we propose a post-processing heuristic algorithm for the LCS problem, the Deposition and Extension algorithm (DEA). The algorithm first generates a common subsequence by a process of sequence deposition and then extends this common subsequence. The algorithm is proven to generate common subsequences (CSs) with guaranteed lengths. Experiments show that DEA produces better results than the Long Run and Expansion algorithms, especially on many long sequences. The algorithm also has superior efficiency in both time and space.
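To make the idea of extending a common subsequence concrete, here is a minimal Python sketch. It is not the DEA procedure from the paper, whose details the abstract does not give; it is a generic greedy illustration. The function names (`is_subsequence`, `is_common_subsequence`, `extend_common_subsequence`) and the insert-one-character strategy are assumptions made for this example only.

```python
def is_subsequence(candidate, sequence):
    """Return True if `candidate` can be read left to right inside `sequence`."""
    it = iter(sequence)
    # `ch in it` advances the iterator, so matches must occur in order.
    return all(ch in it for ch in candidate)


def is_common_subsequence(candidate, sequences):
    """Return True if `candidate` is a subsequence of every input sequence."""
    return all(is_subsequence(candidate, s) for s in sequences)


def extend_common_subsequence(cs, sequences):
    """Greedily grow a common subsequence by single-character insertions.

    Hypothetical illustration only: try every position and every character
    shared by all sequences, keep the first insertion that still yields a
    common subsequence, and repeat until no insertion works.
    """
    alphabet = sorted(set.intersection(*(set(s) for s in sequences)))
    improved = True
    while improved:
        improved = False
        for pos in range(len(cs) + 1):
            for ch in alphabet:
                candidate = cs[:pos] + ch + cs[pos:]
                if is_common_subsequence(candidate, sequences):
                    cs = candidate
                    improved = True
                    break
            if improved:
                break
    return cs


if __name__ == "__main__":
    seqs = ["ACGTAC", "AGTCAC", "CAGTAC"]
    result = extend_common_subsequence("", seqs)
    print(result, is_common_subsequence(result, seqs))
```

The result is a common subsequence that cannot be lengthened by any single insertion. That is a local optimum and not necessarily a longest common subsequence, which is why heuristics of this kind are judged by the lengths they reach in practice.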

Similar resources

A New Index-Based Parallel Algorithm for finding Longest Common Subsequence in Multiple DNA Sequences

This paper presents a new parallel algorithm for computing a longest common subsequence in multiple DNA sequences. It uses a heuristic approach. Although a lot of research has been carried out on finding the LCS of two or more given protein, DNA, or RNA sequences, not many parallel methods exist for finding the LCS of multiple sequences. Normally in existing algorithms the time complexity...

Development of Cache Oblivious Based Fast Multiple Longest Common Subsequence Technique(CMLCS) for Biological Sequences Prediction

A biological sequence is a single, continuous molecule of nucleic acid or protein. Classical methods for the Multiple Longest Common Subsequence (MLCS) problem are based on dynamic programming. The MLCS problem is to find the longest subsequence shared by two or more strings. For over 30 years, significant efforts have been made to find ef...
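The dynamic programming approach mentioned in this abstract can be sketched for the simplest case of two strings. This is the standard textbook recurrence, not the cache-oblivious CMLCS technique of the listed paper; the function name `lcs` is chosen for this example.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings `a` and `b`."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])

    # Walk back from the bottom-right corner to recover one LCS.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    print(lcs("AGGTAB", "GXTXAYB"))
```

For two strings of lengths m and n this takes O(mn) time and space. Extending the same table to k sequences requires k dimensions, so the cost grows exponentially with the number of sequences, which is what motivates heuristic methods.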

The Longest Common Subsequence Problem

Algorithms on sequences of symbols have been studied for a long time and now form a fundamental part of computer science. One of the very important problems in analysis of sequences is the longest common subsequence problem. For the general case of an arbitrary number of input sequences, the problem is NP-hard. We describe an approach to solve this problem. This approach is based on constructin...

All Common Subsequences

Time series data abounds in real-world problems. Measuring the similarity of time series is key to solving these problems. One state-of-the-art measure is the longest common subsequence. This measure uses the length of the longest common subsequence as an indication of similarity between sequences, but ignores information contained in the second, third, ..., longest subsequences. I...

Code Similarity Detection in Multiple Large Source Trees using Token Hashes

The ability to find similarities between two source code bases, or within one code base, has many uses including the detection of student plagiarism, the identification of intellectual property violations and the location of repeated code in a code base amenable to refactoring. Previous structure-metric approaches have used either suffix trees or modified Longest Common Subsequence algorithms t...

Publication date: 2009